
    Accounting data recovery: a case report from INFN-T1

    Starting from summer 2013, the amount of computational activity of the INFN-T1 centre reported by the official accounting web portal of the EGI community, accounting.egi.eu, was found to be much lower than the real value. A deep investigation of the accounting system pointed out a number of subtle concurrent causes, whose effects dated back to May and were responsible for a loss of collected data records over a period of about 130 days. The ordinary recovery method would have required about one hundred days, so a different solution had to be designed and implemented. Applying it to the involved set of raw log files (records, for an average production rate of jobs/day) required less than 4 hours to reconstruct the Grid accounting records; the propagation of these records through the usual dataflow up to the EGI portal was then a matter of a few days. This report describes the work done at INFN-T1 to achieve this result. The procedure was subsequently adopted to solve a similar problem affecting another site, INFN-PISA. This solution suggests a possible alternative accounting model and provided us with deep insight into the most subtle aspects of this delicate subject.
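    The abstract does not give the actual reconstruction procedure, but the general idea of rebuilding accounting records directly from raw batch log files can be sketched as below. The log format, field names and file layout are purely illustrative assumptions, not the INFN-T1 format.

    ```python
    # Hypothetical sketch: rebuild per-job accounting records from raw batch log files.
    # The "key=value" log layout and the field names are assumptions for illustration.
    import csv
    import glob

    FIELDS = ["job_id", "user", "queue", "wall_s", "cpu_s", "end_time"]

    def parse_log_line(line):
        """Parse one 'key=value' style log line into an accounting record (assumed format)."""
        kv = dict(tok.split("=", 1) for tok in line.strip().split() if "=" in tok)
        return {
            "job_id": kv.get("jobid"),
            "user": kv.get("user"),
            "queue": kv.get("queue"),
            "wall_s": int(kv.get("wall", 0)),
            "cpu_s": int(kv.get("cpu", 0)),
            "end_time": kv.get("end"),
        }

    def rebuild_records(log_glob, out_csv):
        """Scan all raw log files and emit one accounting record per completed job."""
        records = {}
        for path in sorted(glob.glob(log_glob)):
            with open(path) as fh:
                for line in fh:
                    rec = parse_log_line(line)
                    if rec["job_id"]:
                        records[rec["job_id"]] = rec  # the last entry for a job wins
        with open(out_csv, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(records.values())

    if __name__ == "__main__":
        rebuild_records("raw_logs/*.log", "accounting_records.csv")
    ```

    A batch scan of this kind, run once over the affected period, is what makes a few-hour reconstruction plausible compared with replaying the ordinary day-by-day collection.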

    Job packing: optimized configuration for job scheduling

    The default behaviour of a batch system is to dispatch jobs to the nodes having the lowest value of some load index. While this causes jobs to be distributed equally among all the nodes in the farm, there are cases when a different behaviour may be desirable, such as completely filling a node before dispatching jobs to another one, or dispatching similar jobs to nodes already running jobs of the same kind. This work defines the packing concept, different packing policies and useful metrics to evaluate how good a policy is. A simple farm simulator has been written to evaluate the expected impact of different packing policies on a farm. The simulator is run against a sequence of real jobs, whose parameters are taken from the accounting database of the INFN Tier-1. The effectiveness of two packing policies of interest, namely relaxed and exclusive, is compared. The exclusive policy proves to be better, at the cost of unused cores in the farm, whose number is estimated. The possibility of implementing the exclusive policy on a specific batch system, LSF 7.06, is explored; the relevant configurations are shown and an overall description of the mechanism is presented.
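    A minimal farm-simulator sketch in the spirit of the one described above is shown here. The node size, the synthetic job mix and, in particular, the precise meaning of "relaxed" and "exclusive" are simplified assumptions, not the paper's exact definitions: here both policies pack onto the fullest node that still fits, and "exclusive" additionally dedicates a node to a single job type.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        cores: int
        used: int = 0
        job_type: Optional[str] = None  # job type currently hosted (used by the exclusive policy)

        def free(self):
            return self.cores - self.used

    def dispatch(nodes, job_cores, job_type, policy):
        """Pick a node for a job according to the packing policy; return it, or None if nothing fits."""
        if policy == "relaxed":
            candidates = [n for n in nodes if n.free() >= job_cores]
        else:  # "exclusive": only empty nodes or nodes already running this job type
            candidates = [n for n in nodes
                          if n.free() >= job_cores and n.job_type in (None, job_type)]
        if not candidates:
            return None
        best = max(candidates, key=lambda n: n.used)  # pack: prefer the fullest node that still fits
        best.used += job_cores
        best.job_type = job_type
        return best

    def idle_cores_on_busy_nodes(nodes):
        """Metric: cores left unused on nodes that already host at least one job."""
        return sum(n.free() for n in nodes if n.used > 0)

    # Usage: a tiny synthetic job stream on a 4-node, 16-core-per-node farm
    nodes = [Node(cores=16) for _ in range(4)]
    for cores, jtype in [(8, "mcore"), (1, "single"), (8, "mcore"), (1, "single")]:
        dispatch(nodes, cores, jtype, policy="exclusive")
    print("idle cores on busy nodes:", idle_cores_on_busy_nodes(nodes))
    ```

    Replaying a real job trace through such a loop and recording the idle-core metric for each policy is exactly the kind of comparison the abstract describes.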

    Changing the batch system in a Tier 1 computing center: why and how

    At the Italian Tier 1 centre at CNAF we are evaluating the possibility of changing the current production batch system. This activity is motivated mainly by the search for a more flexible licensing model and by the wish to avoid vendor lock-in. We performed a technology tracking exercise and, among many possible solutions, chose to evaluate Grid Engine as an alternative, because its adoption is increasing in the HEPiX community and because it is supported by the EMI middleware that we currently use on our computing farm. Another INFN site evaluated Slurm, and we will compare our results in order to understand the pros and cons of the two solutions. We will present the results of our evaluation of Grid Engine, in order to understand whether it can fit the requirements of a Tier 1 centre, compared to the solution we adopted long ago. We performed a survey and a critical re-evaluation of our farming infrastructure: many production software components (accounting and monitoring above all) rely on our current solution, and changing it required us to write new wrappers and adapt the infrastructure to the new system. We believe the results of this investigation can be very useful to other Tier-1 and Tier-2 centres in a similar situation, where the effort of switching may appear too hard to sustain. We will provide guidelines to understand how difficult this operation can be and how long the change may take.
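    The abstract mentions that accounting and monitoring tools had to be decoupled from the batch system through new wrappers. A hedged sketch of such an abstraction layer is shown below; the interface, class names and output parsing are illustrative assumptions, not the wrappers actually written at CNAF, although the `bjobs` and `qstat` commands are the standard LSF and Grid Engine query tools.

    ```python
    # Sketch of a thin batch-system abstraction so monitoring/accounting code does not
    # depend on vendor-specific commands. Parsing details are simplified assumptions.
    import subprocess
    from abc import ABC, abstractmethod

    class BatchSystem(ABC):
        """Minimal interface that accounting and monitoring tools could code against."""

        @abstractmethod
        def running_jobs(self) -> int: ...

    class LSF(BatchSystem):
        def running_jobs(self) -> int:
            out = subprocess.run(["bjobs", "-r", "-noheader"],
                                 capture_output=True, text=True, check=False)
            return len([l for l in out.stdout.splitlines() if l.strip()])

    class GridEngine(BatchSystem):
        def running_jobs(self) -> int:
            out = subprocess.run(["qstat", "-s", "r"],
                                 capture_output=True, text=True, check=False)
            lines = out.stdout.splitlines()
            return max(len(lines) - 2, 0)  # assumes two header lines before the job list

    def report(batch: BatchSystem):
        print("running jobs:", batch.running_jobs())
    ```

    With such a layer in place, switching batch system becomes a matter of adding one implementation class rather than rewriting every downstream tool.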

    Improved Cloud resource allocation: how INDIGO-Datacloud is overcoming the current limitations in Cloud schedulers

    Work presented at the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP2016), 10–14 October 2016, San Francisco.
    Performing efficient resource provisioning is a fundamental aspect for any resource provider. Local Resource Management Systems (LRMS) have been used in data centres for decades in order to obtain the best usage of the resources, providing their fair usage and partitioning for the users. In contrast, current cloud schedulers are normally based on the immediate allocation of resources on a first-come, first-served basis, meaning that a request will fail if there are no resources (e.g. OpenStack) or it will be trivially queued, ordered by entry time (e.g. OpenNebula). Moreover, these scheduling strategies are based on a static partitioning of the resources, meaning that existing quotas cannot be exceeded even if there are idle resources allocated to other projects. This is a consequence of the fact that cloud instances are not associated with a maximum execution time, and it leads to a situation where the resources are under-utilized. These facts have been identified by the INDIGO-DataCloud project as being too simplistic for accommodating scientific workloads in an efficient way, leading to an underutilization of the resources, an undesirable situation in scientific data centres. In this work, we present the work done in the scheduling area during the first year of the INDIGO project and the foreseen evolutions.
    The authors want to acknowledge the support of the INDIGO-DataCloud project (grant number 653549), funded by the European Commission's Horizon 2020 Framework Programme. Peer Reviewed.
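    The contrast drawn above, between failing a request outright when quotas are full and holding it in a queue until capacity frees up, can be illustrated with the small sketch below. This is not INDIGO code; the class, the priority formula and the bookkeeping are assumptions meant only to show the queued, fair-share-ordered alternative to first-come, first-served allocation.

    ```python
    # Illustrative sketch: instead of failing a request when capacity is exhausted,
    # queue it and serve queued requests in fair-share order as capacity frees up.
    import heapq
    import time
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Request:
        priority: float                       # lower value = served first
        submitted: float = field(compare=False)
        project: str = field(compare=False)
        cores: int = field(compare=False)

    class QueuingScheduler:
        def __init__(self, total_cores, usage):
            self.total_cores = total_cores
            self.used = 0
            self.usage = usage                # historical per-project usage, drives fair share
            self.pending = []

        def submit(self, project, cores):
            prio = self.usage.get(project, 0.0)  # heavier past usage -> lower priority
            heapq.heappush(self.pending, Request(prio, time.time(), project, cores))
            self.schedule()

        def release(self, cores):
            """Called when an instance terminates; freed capacity lets queued requests start."""
            self.used -= cores
            self.schedule()

        def schedule(self):
            while self.pending and self.pending[0].cores <= self.total_cores - self.used:
                req = heapq.heappop(self.pending)
                self.used += req.cores
                self.usage[req.project] = self.usage.get(req.project, 0.0) + req.cores
                print(f"started {req.cores}-core instance for {req.project}")
    ```

    The key difference from a plain FIFO is the priority derived from historical usage, which lets under-served projects overtake heavy consumers once resources become available.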

    A self-configuring control system for storage and computing departments at INFN-CNAF Tier1

    The storage and farming departments at the INFN-CNAF Tier1 [1] manage thousands of computing nodes and several hundred servers that provide access to the disk and tape storage. In particular, the storage server machines provide the following services: efficient access to about 15 petabytes of disk space organized in different GPFS file system clusters, data transfers between LHC Tier sites (Tier0, Tier1 and Tier2) via a GridFTP cluster and the Xrootd protocol, and finally writing and reading operations on the magnetic tape backend. One of the most important and essential points in order to obtain a reliable service is a control system that can warn if problems arise and that is able to perform automatic recovery operations in case of service interruptions or major failures. Moreover, during daily operations the configuration can change: for example, the roles of GPFS cluster nodes can be modified, so obsolete nodes must be removed from the production control system and new servers added to those already present. The manual management of all these changes can be difficult when there are many of them; it can also take a long time and is easily subject to human error or misconfiguration. For these reasons we have developed a control system with the ability to reconfigure itself whenever a change occurs. This system has been in production for about a year at the INFN-CNAF Tier1 with good results and hardly any major drawback. There are three major key points in this system. The first is a software configuration service (e.g. Quattor or Puppet) for the server machines that we want to monitor with the control system; this service must ensure the presence of appropriate sensors and custom scripts on the nodes to be checked, and should be able to install and update software packages on them. The second key element is a database containing information, in a suitable format, on all the machines in production, able to provide for each of them the principal information, such as the type of hardware, the network switch to which the machine is connected, whether the machine is physical or virtual, the possible hypervisor to which it belongs, and so on. The last key point is the control system software itself (in our implementation we chose Nagios), capable of assessing the status of servers and services, attempting to restore the working state, restarting or inhibiting software services, and sending suitable alarm messages to the site administrators. The integration of these three elements was achieved with appropriate scripts and custom implementations that allow the self-configuration of the system according to a decisional logic; the whole combination of the above-mentioned components is discussed in depth in this paper.
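    The core of the self-configuration idea, tying the machine inventory database to the Nagios configuration, can be sketched as follows. The database schema, column names and file paths are assumptions for illustration and not the actual CNAF implementation.

    ```python
    # Hedged sketch: query an inventory database and regenerate Nagios host definitions,
    # so servers added to or removed from production appear in (or disappear from)
    # monitoring without manual edits. Table and column names are illustrative assumptions.
    import sqlite3

    NAGIOS_HOST_TEMPLATE = """define host {{
        use                 generic-host
        host_name           {name}
        address             {address}
        hostgroups          {role}
    }}
    """

    def generate_nagios_config(db_path, out_path):
        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT hostname, ip, role FROM machines WHERE in_production = 1"
        ).fetchall()
        with open(out_path, "w") as fh:
            for hostname, ip, role in rows:
                fh.write(NAGIOS_HOST_TEMPLATE.format(name=hostname, address=ip, role=role))
        conn.close()
        # A periodic job would then validate and reload the Nagios configuration.

    if __name__ == "__main__":
        generate_nagios_config("inventory.db", "/etc/nagios/conf.d/hosts.cfg")
    ```

    Run periodically by the configuration-management layer, a generator of this kind removes the manual step that the abstract identifies as slow and error-prone.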

    Survey of the main causal agents of fusarium head blight of durum wheat around Bologna, northern Italy

    Several Fusarium species and Microdochium nivale are involved in fusarium head blight (FHB), which in Italy has been constantly present on wheat since 1995. This study was carried out from 1995 to 2007 on FHB-infected durum wheat heads collected in the Bologna countryside, Emilia-Romagna, northern Italy. The most frequent Fusarium species found were Fusarium graminearum (32.1%), F. culmorum (25.2%) and F. poae (17.8%), while F. avenaceum and M. nivale occurred discontinuously. Other Fusarium species were also found, but only sporadically. It is important to identify and characterize the main species involved in the FHB syndrome, because this will help to establish control strategies that contain the disease and the content of mycotoxins in food and animal feed.

    Implementation and use of a highly available and innovative IaaS solution: the Cloud Area Padovana

    While in the business world the cloud paradigm is typically implemented by purchasing resources and services from third-party providers (e.g. Amazon), in the scientific environment there is usually a need for on-premises IaaS infrastructures that allow efficient usage of hardware distributed among (and owned by) different scientific administrative domains. In addition, the requirement of open-source adoption has led many organizations to choose products like OpenStack. We describe a use case of the Italian National Institute for Nuclear Physics (INFN) which resulted in the implementation of a single cloud service, called 'Cloud Area Padovana', which encompasses resources spread over two different sites: the INFN Legnaro National Laboratories and the INFN Padova division. We describe how this IaaS has been implemented, which technologies have been adopted and how services have been configured in high-availability (HA) mode. We also discuss how identity and authorization management were implemented, adopting a widely accepted standard architecture based on SAML2 and OpenID: by leveraging the versatility of those standards, integration with authentication federations like IDEM was achieved. We also discuss other innovative developments, such as a pluggable scheduler, implemented as an extension of the native OpenStack scheduler, which allows the allocation of resources according to a fair-share based model and provides a persistent queuing mechanism for handling user requests that cannot be immediately served. The tools, technologies and procedures used to install, configure, monitor and operate this cloud service are also discussed. Finally, we present some examples that show how this IaaS infrastructure is being used.
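    The abstract mentions a fair-share based allocation model; a minimal sketch of what a fair-share priority function can look like is given below. The formula, the half-life decay and the parameter names are assumptions for illustration, not the actual OpenStack scheduler extension developed for the Cloud Area Padovana.

    ```python
    # Minimal fair-share priority sketch: past usage, decayed over time, is weighed
    # against the share nominally assigned to a project. Smaller value = higher priority.
    import math
    import time

    def fair_share_priority(used_hours, share, last_decay, half_life_h=168.0, now=None):
        """used_hours : historical resource-hours charged to the project
        share       : fraction of the infrastructure nominally assigned to the project
        last_decay  : timestamp of the last usage-decay update
        half_life_h : age (hours) after which past usage weighs half as much (assumed value)
        """
        now = now or time.time()
        age_h = (now - last_decay) / 3600.0
        decayed = used_hours * math.pow(0.5, age_h / half_life_h)
        return decayed / max(share, 1e-9)

    # Example: with equal shares, the project that consumed more recently gets lower priority
    print(fair_share_priority(used_hours=500, share=0.3, last_decay=time.time()))
    print(fair_share_priority(used_hours=100, share=0.3, last_decay=time.time()))
    ```

    A persistent queue ordered by such a priority is what allows requests that cannot be served immediately to wait instead of failing.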

    Recommendations for radiation therapy in oligometastatic prostate cancer:An ESTRO-ACROP Delphi consensus

    Background and purpose: Oligometastatic prostate cancer is a new and emerging treatment field with only a few prospective randomized studies published so far. Despite the lack of strong level I evidence, metastasis-directed therapies (MDT) are widely used in clinical practice, mainly based on retrospective and small phase 2 studies, and with large differences across centers. Pending the results of ongoing prospective randomized trials, there is a clear need for more consistent treatment indications and radiotherapy practices.
    Material and methods: A European Society for Radiotherapy and Oncology (ESTRO) Guidelines Committee consisting of radiation oncologists with expertise in prostate cancer was asked to answer a dedicated questionnaire, including 41 questions on the main controversial issues regarding oligometastatic prostate cancer.
    Results: The panel achieved consensus on patient selection and on the routine use of prostate-specific membrane antigen positron emission tomography (PSMA PET) imaging as the preferred staging and restaging modality. MDT strategies are recommended in the de novo oligometastatic, oligorecurrent and oligoprogressive disease settings for nodal, bone and visceral metastases. Radiation therapy doses, volumes and techniques were discussed and commented on.
    Conclusion: These recommendations have the purpose of providing standardization and consensus to optimize the radiotherapy treatment of oligometastatic prostate cancer until mature results of randomized trials are available.
    AT would like to acknowledge the support of Cancer Research UK (C33589/A28284 and C7224/A28724). This project represents independent research supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at The Royal Marsden NHS Foundation Trust and the Institute of Cancer Research, London. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

    INDIGO-DataCloud: A data and computing platform to facilitate seamless access to e-infrastructures

    This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed allows hybrid resources to be federated and scientific applications to be easily written, ported and run on the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud and Grid based infrastructures, as well as on HPC clusters. Our developments are freely downloadable as open source components and are already being integrated into many scientific applications.